A dataset of ground-dwelling nocturnal fauna for object detection and classification

The exploration of ground-dwelling nocturnal fauna represents a significant challenge due to its broad implications across various sectors, including pesticide management, crop yield forecasting, and plant disease identification. This paper unveils an annotated dataset, BioAuxdataset, aimed at facilitating the recognition of such fauna through field images gathered across multiple years. Culled from a collection exceeding 100,000 raw field images over a span of four years, this meticulously curated dataset features seven prevalent species of nocturnal ground-dwelling fauna: carabid, mouse, opilion, slug, shrew, small-slug, and worm. In instances of underrepresented species within the dataset, we have implemented straightforward yet potent image augmentation techniques to enhance data quality. BioAuxdataset stands as a valuable resource for the detection and identification of these organisms, leveraging the power of deep learning algorithms to unlock new potentials in ecological research and beyond. This dataset not only enriches the academic discourse but also opens up avenues for practical applications in agriculture, environmental science, and biodiversity conservation.



Value of the Data
• Researching ground-dwelling nocturnal fauna presents significant challenges due to its wide-ranging implications in domains such as pesticide use management, crop yield prediction, plant disease identification, and biodiversity enhancement through natural predation [1,2].
• The dataset is open access, providing a valuable resource for future researchers and engineers.
• This dataset serves as a foundational tool for training, testing, and validating deep learning algorithms designed to recognize various organisms.

Data Description
In our research, we concentrated on seven prevalent species of ground-dwelling nocturnal fauna: carabid, mouse, opilion, slug, shrew, small-carabid, small-slug, and worm. The core of our study, the BioAuxDataset, comprises 7470 annotated images: 7470 JPEG images, each paired with its XML annotation file. A notable distinction exists between our field-acquired images and those typically captured in a laboratory setting. While laboratory images often showcase subjects in full detail, including organ visibility, the fauna in our dataset may be obscured by natural vegetation, situated in dimly lit scenes, or partially outside the frame. Field images, originally exceeding 3 MB in size, were downsized to facilitate their use without sacrificing quality. The dataset includes both these reduced-size images and augmented versions, whose file sizes vary significantly, from the smallest at 14.7 kilobytes to the largest at 2.7 megabytes. This variety ensures that during the training phase, the deep learning model [3,4] encounters a broad spectrum of scenarios, enhancing its performance in subsequent detection tasks.
Furthermore, the dataset's utility for deep learning algorithms is enriched by the number of occurrences of individuals per class and the diversity in the sizes of these individuals, which vary widely among different classes and even within the same class depending on the developmental stage. Such diversity is critical for the model's ability to learn effectively. Table 1 details the occurrences of individuals per class, while Table 2 gives the minimum and maximum sizes of individuals by class, including the ratio of an individual's size to the size of the image frame.
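The size ratio reported in Table 2 can be recomputed from an annotation file. The following is a minimal sketch, assuming the XML files follow the Pascal VOC layout that LabelImg writes by default (a `size` element plus one `bndbox` per `object`); the sample annotation and the function name `box_ratios` are hypothetical and serve only as an illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotation in the Pascal VOC layout (assumed, not taken from the dataset).
SAMPLE_XML = """<annotation>
  <filename>example.jpg</filename>
  <size><width>1000</width><height>750</height><depth>3</depth></size>
  <object>
    <name>slug</name>
    <bndbox><xmin>100</xmin><ymin>200</ymin><xmax>200</xmax><ymax>260</ymax></bndbox>
  </object>
</annotation>"""


def box_ratios(xml_text):
    """Return (class_name, box_area / image_area) for each annotated object."""
    root = ET.fromstring(xml_text)
    image_area = int(root.findtext("size/width")) * int(root.findtext("size/height"))
    ratios = []
    for obj in root.iter("object"):
        box = obj.find("bndbox")
        xmin, ymin, xmax, ymax = (
            int(box.findtext(tag)) for tag in ("xmin", "ymin", "xmax", "ymax")
        )
        box_area = (xmax - xmin) * (ymax - ymin)
        ratios.append((obj.findtext("name"), box_area / image_area))
    return ratios


ratios = box_ratios(SAMPLE_XML)
for name, ratio in ratios:
    print(f"{name}: {ratio:.2%} of the image area")
```

Applying the same function to every annotation file and grouping by class name would give per-class minimum and maximum ratios comparable to those in Table 2.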

Experimental devices
To address the unique requirements of field visualization, we engineered a bespoke real-time image capture system built around the compact and versatile Raspberry Pi nanocomputer. Our setup incorporated multiple cameras from Bushnell and Berger Et Shröter, strategically positioned across various plots. Each camera, equipped with an SD memory card, was programmed to automatically capture images at fifteen-second intervals. These devices were managed by the Raspberry Pi, a highly portable nanocomputer with an ARM microprocessor, 4 GB of RAM, a dedicated video card, and Wi-Fi connectivity, which allows operation in diverse field conditions. This arrangement provides continuous, detailed monitoring of our study areas, as depicted in Fig. 1 (image on the left).
An intuitive web interface simplifies the operation of our system. Daily, the SD card's contents are transferred to one of the hard drives on our image server for storage. To prepare the high-resolution field images for analysis, they undergo three stages of processing, described below:

Image resizing
The initial phase of our image processing involved downsizing the raw images to make them more manageable for computational analysis, without compromising the integrity of the information they contained. This resizing is essential because deep learning algorithms require optimized image sizes for efficient processing in memory or on GPUs. Our approach maintained the original aspect ratio of each image to ensure that the proportions of the subjects within them remained consistent relative to each other. To achieve this, we standardized the width of all reduced images to 1000 pixels. Consequently, the size of these adjusted images now ranges from 14.7 to 750 kilobytes, allowing for a balance between detail preservation and computational efficiency.
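As an illustration of aspect-ratio-preserving resizing, here is a minimal sketch that computes the target dimensions for a fixed width of 1000 pixels. The function names are hypothetical, and the optional `resize_file` helper assumes the Pillow library; the resampling filter and JPEG quality used there are assumptions, not the settings used to build the dataset.

```python
def resized_dimensions(width, height, target_width=1000):
    """Scale (width, height) to target_width while keeping the aspect ratio."""
    return target_width, round(height * target_width / width)


def resize_file(src_path, dst_path, target_width=1000):
    """Resize one image file with Pillow (assumed to be installed)."""
    from PIL import Image  # imported lazily so the arithmetic above runs without Pillow

    with Image.open(src_path) as img:
        new_size = resized_dimensions(img.width, img.height, target_width)
        # LANCZOS filter and quality=90 are illustrative choices only.
        img.resize(new_size, Image.LANCZOS).save(dst_path, quality=90)


# A raw 3264 x 2448 field image keeps its 4:3 proportions after resizing.
new_size = resized_dimensions(3264, 2448)
print(new_size)
```

Fixing only the width and deriving the height keeps subjects undistorted, so relative sizes remain comparable across images.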

Image labeling and augmentation
Following the resizing process, we annotated the images using the LabelImg application, a Python-based tool. In situations where the quantity of certain species was insufficient for effective model training, we generated additional images through image augmentation techniques. These techniques were carefully applied to respect the natural movement patterns of the subjects. For instance, while carabids could rotate in any direction, they always remained grounded. Conversely, shrews, slugs, and mice displayed the ability to climb and potentially adopt vertical positions, reflecting their diverse locomotion capabilities. The inherent elasticity of creatures like slugs and worms necessitated the use of image segmentation for duplication, employing tools such as GIMP for image retouching or the OpenCV library for coding solutions. Additionally, we developed scripts to rotate images, ensuring the original annotations were preserved post-transformation.
This annotation phase was meticulous, governed by strict labeling conventions to facilitate precise model training and, consequently, more accurate organism detection. These conventions included:
• the complete labeling of visible individuals
• the annotation of partially obscured subjects
• the crafting of bounding boxes that snugly encapsulate the subjects.
Fig. 2 shows an example of an annotated image.
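To show how an annotation can be carried through a rotation, the sketch below maps a bounding box through a 90-degree clockwise rotation of the image. It is an illustration under stated assumptions, not the authors' script: the function name is hypothetical, boxes are taken as `(xmin, ymin, xmax, ymax)` with the origin at the top-left corner, and coordinates are treated as continuous edge positions, so a pixel-index convention may need an offset of one.

```python
def rotate_box_90_cw(box, image_width, image_height):
    """Map a box through a 90-degree clockwise image rotation.

    A point (x, y) in the original image lands at (image_height - y, x)
    in the rotated image, whose size is (image_height, image_width).
    """
    xmin, ymin, xmax, ymax = box
    new_box = (image_height - ymax, xmin, image_height - ymin, xmax)
    new_size = (image_height, image_width)
    return new_box, new_size


# Hypothetical box in a 1000 x 750 image.
new_box, new_size = rotate_box_90_cw((100, 200, 300, 400), 1000, 750)
print(new_box, new_size)
```

For rotations by arbitrary angles, one common approach is to rotate the four box corners and take their axis-aligned enclosing rectangle, at the cost of a looser fit around the subject.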

Temporal dynamics of individuals
For the purposes of testing and validating learning models, we have put online three new raw image datasets: "2016-6-30-Bushnell-Chrysope", "2016-7-27-Bushnell-SpidersAnchomenus-Slugs" and "2016-7-13au15-Carnage". All three datasets contain manual annotation XML files and a sub-directory called "automaticannotation" containing automatic annotation XML files created by a learning model. In addition, in each dataset, a CSV file (metadata.csv) summarizes the results of the automatic annotation. This file contains, among other information, the following columns:
• dataset identifier
• image name
• a column for each class (carabid, ..., worm)
• a presence/absence indicator
• date and time
This CSV file can be queried for outputs such as the count of individuals over a given period (Fig. 3) or the dynamics (temporality) of individuals (Fig. 4).
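A query of this kind can be sketched with the Python standard library. The column names (`dataset`, `image`, `presence`, `datetime`), the timestamp format, and the sample rows below are all hypothetical, since the exact header of metadata.csv is not given here; they should be adapted to the real file.

```python
import csv
import io
from collections import Counter
from datetime import datetime

# Made-up rows for illustration only; not data from the published datasets.
SAMPLE_CSV = """dataset,image,carabid,slug,worm,presence,datetime
demo,img_0001.jpg,1,0,0,1,2016-07-13 22:00:15
demo,img_0002.jpg,0,2,0,1,2016-07-13 22:30:45
demo,img_0003.jpg,0,0,0,0,2016-07-13 23:10:00
demo,img_0004.jpg,2,1,0,1,2016-07-13 23:45:30
"""


def counts_per_hour(csv_text, class_names):
    """Sum the per-image detections of each class within each hour."""
    totals = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        stamp = datetime.strptime(row["datetime"], "%Y-%m-%d %H:%M:%S")
        hour = stamp.strftime("%Y-%m-%d %H:00")
        bucket = totals.setdefault(hour, Counter())
        for name in class_names:
            bucket[name] += int(row[name])
    return totals


hourly = counts_per_hour(SAMPLE_CSV, ["carabid", "slug", "worm"])
for hour, counts in sorted(hourly.items()):
    print(hour, dict(counts))
```

Because images are captured every fifteen seconds, the same individual may appear on consecutive frames, so these sums measure detections per period rather than distinct individuals.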

Limitations
None.

Fig. 1. Experimental field devices.

Each field-captured JPEG image measures 3264 by 2448 pixels. Our setup included two experimental devices:
• PiScope1 (depicted in Fig. 1, middle image) is an image acquisition tool that, from 2016 to 2019, captured over 70,000 raw RGB color images and several videos, providing a rich dataset for analysis.
• PiScope2 (shown in Fig. 1, right image) contributed more than 33,000 additional raw images during the 2017-2018 period.
These images range in size from 2 to 3.5 megabytes.

Fig. 2. An example of an image annotated with LabelImg, here a slug. The time of acquisition is indicated at the bottom of the image.

Table 1
Number of individuals for the 7 classes of BioAuxDataset.

Table 2
The size ratio of carabid, opilion, small-carabid and small-slug individuals varies from 0.02 % to 0.8 % of the image size (here, the image size is the product height × width).